Skip to content

test: stop leaked-spy cascades across the whole test runner#590

Open
ndycode wants to merge 3 commits into
mainfrom
claude/audit-71-storage-spy-hygiene
Open

test: stop leaked-spy cascades across the whole test runner#590
ndycode wants to merge 3 commits into
mainfrom
claude/audit-71-storage-spy-hygiene

Conversation

@ndycode

@ndycode ndycode commented Jun 11, 2026

Copy link
Copy Markdown
Owner

Summary

What Changed

The failure mechanics, observed live in sandboxed environments where the EACCES-baseline tests die mid-run:

  1. An early test spies a method (e.g. fs.rename) and crashes before reaching its mockRestore() (no try/finally).
  2. A later test calls vi.spyOn on the same method — vitest returns the same leaked spy rather than wrapping again.
  3. That test's "original" passthrough binding (fs.rename.bind(fs)) therefore captures the spy itself, and calling through recurses into the test's own mock — a RangeError: Maximum call stack size exceeded crash with a confusing stack.

Two layers land here:

  • test/storage.test.ts (first commit): the file's top-level afterEach restored env vars but never mocks; it now calls vi.restoreAllMocks(), with a comment documenting the mechanism and (per review) its relationship to the global hook. The storage suite is where the cascade was observed.
  • test/helpers/global-sandbox.ts (second commit): a repo scan found 16 more test files with the same shape (bare mockRestore() at the end of a test body, no global restore hook). Rather than patching each, the shared setup file — already loaded by every suite — registers one global afterEach(() => vi.restoreAllMocks()), containing the cascade everywhere. No suite creates spies in beforeAll (verified by scan), so an after-each restore cannot strand intentional cross-test spies.

No assertions change anywhere; the per-test mockRestore() calls remain as harmless redundancy.

Validation

  • npm run typecheck + npx eslint on both files (pre-commit hook re-runs both)
  • npm test -- test/storage.test.ts with the file-level hook — failure names a strict subset (19) of the 23-name environment baseline block, zero new names
  • Scan proof for the global hook: beforeAll + spyOn co-occurrence search over test/ returns nothing, so no suite relies on a spy surviving past a test
  • Full-suite run with the global hook (all 317 files, --maxWorkers=1): 54 failing names, all in the 58-name docs/audits/evidence/test-baseline-2026-06-10.txt environment baseline, zero new names — and four baseline names now PASS (the storage rotation/backup/artifact tests that were cascade victims, not genuine environment failures). Strict improvement; full name-diff posted as a PR comment.
  • npm run build deferred to CI

Docs and Governance Checklist

  • No user-visible behavior changed; test isolation only

Risk and Rollback

  • Risk level: low — vi.restoreAllMocks() only affects vi.spyOn spies, which every suite already restores per-test on the success path; the hooks merely cover the failure path. The one theoretical hazard (a suite depending on a beforeAll spy across tests) is excluded by scan.
  • Rollback plan: revert either commit independently; they are belt-and-braces layers of the same defense.

https://claude.ai/code/session_01XNtnkLbBiXZxfQQYLMpucB

A test in this file that fails before its inline mockRestore() leaks
its fs spy into every later test: a later vi.spyOn on the same method
returns the SAME leaked spy, so a passthrough binding captured from it
recurses into the new test's own mock (a Maximum call stack crash
observed in sandboxed environments where the EACCES baseline tests die
mid-run). The top-level afterEach restored env vars but never mocks;
vi.restoreAllMocks() there guarantees one failure cannot cascade.

No assertion changes; the per-test mockRestore() calls remain as
harmless redundancy.

https://claude.ai/code/session_01XNtnkLbBiXZxfQQYLMpucB
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Test infrastructure adds global and per-test-file afterEach hooks that call vi.restoreAllMocks() after each test to prevent leaked mock spies from affecting later test runs.

Changes

Vitest Mock Spy Cleanup

Layer / File(s) Summary
Mock restoration in afterEach hooks
test/helpers/global-sandbox.ts, test/storage.test.ts
Global sandbox helper registers a suite-level afterEach hook that restores all mocks, and test/storage.test.ts adds explicit unconditional vi.restoreAllMocks() in its own afterEach cleanup to prevent spy leakage between tests.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Possibly related PRs

  • ndycode/codex-multi-auth#585: Both PRs touch test/storage.test.ts to add/guard vi.restoreAllMocks() cleanup to prevent leaked fs.rename spies from affecting later tests.
  • ndycode/codex-multi-auth#321: Both PRs adjust Vitest mock/spies lifecycle to prevent cross-test leakage (main adds vi.restoreAllMocks() in afterEach, retrieved strengthens CLI test isolation by fully resetting/hardening mocks before each test).

Suggested labels

bug

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed title follows conventional commits format with 'test' type, includes scope context about spy cascades, and is 59 chars under the 72-char limit.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed pr description is thorough and addresses all template sections with clear technical detail on the cross-test contamination issue and validation evidence.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/audit-71-storage-spy-hygiene
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch claude/audit-71-storage-spy-hygiene

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Extends the storage-suite fix to the whole runner: a scan found 16 more
test files with the same vulnerability (bare mockRestore() at the end
of a test body, no global restore hook), where one mid-test failure
leaks a spy that a later vi.spyOn on the same method returns verbatim,
sending passthrough bindings into infinite recursion. No suite creates
spies in beforeAll, so an after-each restoreAllMocks in the shared
global-sandbox setup file is safe and contains the cascade everywhere.

https://claude.ai/code/session_01XNtnkLbBiXZxfQQYLMpucB
@ndycode ndycode changed the title test: restore all mocks in storage suite afterEach to stop spy cascades test: stop leaked-spy cascades across the whole test runner Jun 11, 2026
ndycode added a commit that referenced this pull request Jun 11, 2026
docs: include token-refresh in the rotation helper cluster description

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/storage.test.ts`:
- Around line 56-60: The test file contains a redundant per-suite call to
vi.restoreAllMocks() that is already handled by the global afterEach hook in
test/helpers/global-sandbox.ts; remove the vi.restoreAllMocks() call from
storage.test.ts to avoid maintenance noise, or if you prefer to keep the
defensive call, add a one-line comment next to vi.restoreAllMocks() referencing
the global afterEach in test/helpers/global-sandbox.ts (and its line range) so
future maintainers understand it is redundant but intentional.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: be64b33f-4483-4090-ab76-e77572afabb2

📥 Commits

Reviewing files that changed from the base of the PR and between 1d6c3e0 and 7204596.

📒 Files selected for processing (2)
  • test/helpers/global-sandbox.ts
  • test/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{ts,js,mts,cjs}

📄 CodeRabbit inference engine (AGENTS.md)

Use ESM only with "type": "module" in package.json; Node >= 18.17 required

Files:

  • test/helpers/global-sandbox.ts
  • test/storage.test.ts
**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

Do not use as any, @ts-ignore, or @ts-expect-error TypeScript assertions

Files:

  • test/helpers/global-sandbox.ts
  • test/storage.test.ts
**/*.{ts,js}

📄 CodeRabbit inference engine (AGENTS.md)

OAuth callback server uses port 1455; do not hardcode OAuth ports—use existing constants/helpers

Files:

  • test/helpers/global-sandbox.ts
  • test/storage.test.ts
**/*.{js,ts}

📄 CodeRabbit inference engine (README.md)

**/*.{js,ts}: Use JSON format for machine-readable output in diagnostic and reporting commands (status, check, report, monitor, why-selected)
Implement interactive terminal dashboard with hotkeys (Up/Down for navigation, Enter for selection, 1-9 for quick switch, / for search, ? for help, Q for back)
Support device authentication flow via --device-auth flag and manual OAuth callback paste fallback via --manual flag for headless environments
Run npm version check during normal forwarded Codex startup and print upgrade notices only on interactive TTY or when CODEX_MULTI_AUTH_DEBUG=1 is set

Files:

  • test/helpers/global-sandbox.ts
  • test/storage.test.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/helpers/global-sandbox.ts
  • test/storage.test.ts
**

⚙️ CodeRabbit configuration file

**: # PROJECT KNOWLEDGE BASE

Generated: 2026-04-25
Commit: a87e005
Validated: 2026-06-10 against commit 98d9819 (repo audit; claims re-checked against the tree, content not regenerated)
Branch: main
Package version: 2.3.0-beta.2

OVERVIEW

codex-multi-auth is a Codex CLI-first OAuth account manager and optional forwarding wrapper for the official Codex CLI. The installed codex-multi-auth entrypoint handles account-management commands locally, codex-multi-auth-codex forwards official Codex commands through this package's wrapper when explicitly used, and runtime rotation can route live Responses traffic through a localhost account-rotation proxy by default. The plugin-host entrypoint remains exported for compatibility, but the primary product surface is the account manager, optional wrapper, storage, runtime proxy, and repair tooling.

STRUCTURE

./
├── scripts/
│   ├── codex.js              # codex-multi-auth-codex wrapper, official CLI forwarder, shadow CODEX_HOME/runtime proxy setup
│   ├── codex-multi-auth.js   # standalone package CLI entrypoint
│   ├── codex-routing.js      # auth command and compatibility alias routing
│   ├── codex-bin-resolver.js # official Codex binary discovery
│   ├── codex-app-router.js   # persistent localhost router for packaged Codex app bind
│   └── codex-app-launcher.js # reversible user-level app launcher routing helper
├── index.ts                  # optional plugin-host runtime entry
├── lib/                      # core runtime logic (see lib/AGENTS.md)
│   ├── auth/                 # OAuth flow, PKCE, callback server
│   ├── runtime/              # Codex CLI/app integration helpers, app bind, live sync, runtime observability
│   ├── request/              # request transform, SSE, failover, backoff
│   ├── storage/              # path resolution, migrations, backups, restore, import/export
│   ├── codex-cli/            # Codex CLI state sync and writer helpers
│   ├── codex-manager/        # command modules and...

Files:

  • test/helpers/global-sandbox.ts
  • test/storage.test.ts
test/**/*.test.ts

📄 CodeRabbit inference engine (test/AGENTS.md)

test/**/*.test.ts: Write Vitest test suites with globals enabled (describe, it, expect)
Maintain 80%+ coverage threshold across statements, branches, functions, and lines
Use removeWithRetry() for Windows filesystem cleanup instead of bare fs.rm to handle EBUSY, EPERM, and ENOTEMPTY errors
Do not rely on dist/ in tests; use source files instead
Do not skip tests without justification
Relax lint rules for test files as configured in eslint.config.js

Files:

  • test/storage.test.ts
test/**/storage*.test.ts

📄 CodeRabbit inference engine (test/AGENTS.md)

Test V3 storage, worktree migration, and concurrent load scenarios in storage.test.ts and storage-async.test.ts

Files:

  • test/storage.test.ts
🧠 Learnings (2)
📚 Learning: 2026-06-04T06:14:18.093Z
Learnt from: ndycode
Repo: ndycode/codex-multi-auth PR: 510
File: test/scheduling-strategy-config.test.ts:1-1
Timestamp: 2026-06-04T06:14:18.093Z
Learning: In ndycode/codex-multi-auth, do not flag explicit imports from "vitest" (e.g., describe, it, expect, beforeEach/afterEach, etc.) in test files as issues—even if the Vitest config sets `globals: true`. The repo’s established convention is to keep these imports for consistency with neighboring tests; removing them would make files outliers.

Applied to files:

  • test/storage.test.ts
📚 Learning: 2026-06-04T06:14:24.975Z
Learnt from: ndycode
Repo: ndycode/codex-multi-auth PR: 510
File: test/runtime-rotation-proxy.test.ts:2478-2491
Timestamp: 2026-06-04T06:14:24.975Z
Learning: In ndycode/codex-multi-auth test files (e.g. `test/*.test.ts`), when creating V3 storage fixtures for accounts, it’s an intentional convention to use `as never` for deliberately minimal stored-account objects that only include `refreshToken`, `addedAt`, and `lastUsed`. Do not treat `as never` here as a type-safety problem: optional/other fields are expected to be populated by the runtime during execution, and the cast is used solely to keep the fixture minimal and consistent across existing tests.

Applied to files:

  • test/storage.test.ts
🔇 Additional comments (1)
test/helpers/global-sandbox.ts (1)

32-41: LGTM!

Comment thread test/storage.test.ts

ndycode commented Jun 11, 2026

Copy link
Copy Markdown
Owner Author

Full-suite validation of the global hook is in (npx vitest run --maxWorkers=1 on this branch, all 317 files):

  • 54 failing test names, every one of them in the §7 environment baseline — zero new names. The suite-wide afterEach(vi.restoreAllMocks) changes no test outcome anywhere outside the fix's target.
  • Four baseline names now pass that previously failed: keeps rotating backup order deterministic across parallel saves, preserves historical backups when creating the latest backup fails, removes primary, backup, and wal artifacts, and rotates backups and retains historical snapshots. These are exactly the storage-suite cascade victims — they were never genuine environment failures, they were collateral of the spy leaked by the EACCES-failing test ahead of them. The baseline shrinks from 58 to 54.

So the change is a strict improvement: file-level failures stay at the known 8 environment files, and the only name-level deltas are recoveries. Validation evidence retained at /tmp/global-restore-run.txt name-diff against docs/audits/evidence/test-baseline-2026-06-10.txt.


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants